09. Getting Stopwords from NLTK
INSTRUCTOR NOTE:
Depending on your setup, downloading the corpus with the GUI downloader (as I do) can be slow and painful. Here's a Stack Overflow page about downloading it from the command line instead: http://stackoverflow.com/questions/5843817/programmatically-install-nltk-corpora-models-i-e-without-the-gui-downloader
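If you would rather skip the GUI entirely, a script can call `nltk.download` with a package id. The sketch below assumes NLTK is installed; the wrapper name `download_corpus` is hypothetical, and `"stopwords"` is the id of the corpus this lesson uses.

```python
from typing import Optional

import nltk


def download_corpus(package_id: str, download_dir: Optional[str] = None) -> bool:
    """Fetch one NLTK data package by id without opening the GUI downloader."""
    # download_dir=None lets NLTK choose its default nltk_data location;
    # quiet=True suppresses the progress messages nltk.download normally prints
    return bool(nltk.download(package_id, download_dir=download_dir, quiet=True))


if __name__ == "__main__":
    # Only the stopwords corpus is needed for this lesson
    print("stopwords downloaded:", download_corpus("stopwords"))
```

From a shell, `python -m nltk.downloader stopwords` should do the same thing without writing any code.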
Note: Version 3.1 of NLTK has a bug that affects downloading the 'panlex_lite' corpus. A fix is scheduled for version 3.2; in the meantime, you can install this corpus manually with these steps:
- Use nltk.download('all', halt_on_error=False) to get all of the corpora except for the 'panlex_lite' corpus.
- You should have a folder on your computer called "nltk_data" that holds all of the files downloaded by NLTK. (You might find it in your "/Users/username/" folder.) Save the archived version of the corpus from this link into the "nltk_data/corpora" folder. Warning: the zip file is 1.7 GB!
- Unzip the archive. You should end up with a folder "nltk_data/corpora/panlex_lite/" that contains two files of unarchived corpus data.
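To check the manual install, a small script can confirm the folder layout described in the steps above. This is a stdlib-only sketch: the function name `panlex_lite_installed` is hypothetical, and the default path `~/nltk_data` is an assumption based on the "/Users/username/" location mentioned above.

```python
from pathlib import Path
from typing import Union


def panlex_lite_installed(nltk_data_dir: Union[str, Path]) -> bool:
    """Return True if <nltk_data_dir>/corpora/panlex_lite/ exists and holds files."""
    corpus_dir = Path(nltk_data_dir) / "corpora" / "panlex_lite"
    if not corpus_dir.is_dir():
        return False
    # The steps expect two unarchived data files here; this check only
    # requires that at least one regular file is present
    files = [p for p in corpus_dir.iterdir() if p.is_file()]
    return len(files) > 0


if __name__ == "__main__":
    # Assumed location: adjust if your nltk_data folder lives elsewhere
    print(panlex_lite_installed(Path.home() / "nltk_data"))
```

If the folder is somewhere else, the directories NLTK actually searches are listed in `nltk.data.path`, so you can pass each of those in turn.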
An update to the stopwords corpus in March 2016 changed the number of English stopwords: with that corpus data, your answer should be 153.
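To get the count for the quiz, take the length of the English word list from NLTK's stopwords corpus. A minimal sketch, assuming the stopwords corpus has already been downloaded; `count_stopwords` is a hypothetical helper name. The number it prints depends on which version of the corpus you have installed.

```python
from nltk.corpus import stopwords


def count_stopwords(language: str = "english") -> int:
    """Return how many stopwords NLTK's corpus lists for `language`."""
    # stopwords is loaded lazily; words() raises LookupError if the
    # corpus has not been downloaded yet
    return len(stopwords.words(language))


if __name__ == "__main__":
    print(count_stopwords("english"))
```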